ABSTRACT



We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them.

We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based re-ranking of documents via a HITS-style cluster-based approach outperforms a previously proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.

Categories and Subject Descriptors: H.3.3 [Information Search 
and Retrieval]: Retrieval models 

General Terms: Algorithms, Experimentation 

Keywords: bipartite graph, clusters, language modeling, HITS, hubs, authorities, PageRank, high-accuracy retrieval, graph-based retrieval, structural re-ranking, cluster-based language models

1. INTRODUCTION 

To improve the precision of retrieval output, especially within the very few (e.g., 5 or 10) highest-ranked documents that are returned, a number of researchers [36] [13] [16] [7] [22] [34] [25] [17] [18] [9] have considered a structural re-ranking strategy. The idea is to re-rank the top N documents that some initial search engine produces, where the re-ordering utilizes information about inter-document relationships within that set. Promising results have previously been obtained by using document centrality within the initially retrieved list to perform structural re-ranking, on the premise that if the quality of this list is reasonable to begin with, then the documents that are most related to most of the documents on the list are likely to be the most relevant ones. In particular, in our prior work [18] we adapted PageRank [3] (which, due to the success of Google, is surely the most well-established algorithm for defining and computing centrality within a directed graph) to the task of re-ranking non-hyperlinked document sets.

Arguably the most well-known alternative to PageRank is Kleinberg's HITS algorithm [15]. The major conceptual way in which HITS differs from PageRank is that it defines two different types of central items: each node is assigned both a hub and an authority score, as opposed to a single PageRank score. In the Web setting, in which HITS was originally proposed, good hubs correspond roughly to high-quality resource lists or collections of pointers, whereas good authorities correspond to the high-quality resources themselves. Distinguishing between two differing but interdependent types of Webpages is therefore quite appropriate. Our previous study [18] applied HITS to non-Web documents. We found that its performance was comparable to or better than that of algorithms that do not involve structural re-ranking; however, HITS was not as effective as PageRank [18].

Do these results imply that PageRank is better than HITS for structural re-ranking of non-Web documents? Not necessarily, because there may exist graph-construction methods that are more suitable for HITS. Note that the only entities considered in our previous study were documents. If we could introduce entities distinct from documents but enjoying a mutually reinforcing relationship with them, then we might better satisfy the spirit of the hubs-versus-authorities distinction, and thus derive stronger results utilizing HITS.

A crucial insight of the present paper is that document clusters appear extremely well-suited to play this complementary role. The intuition is that: (a) given those clusters that are "most representative" of the user's information need, the documents within those clusters are likely to be relevant; and (b) the "most representative" clusters should be those that contain many relevant documents. This apparently circular reasoning is strongly reminiscent of the interrelated hubs and authorities concepts underlying HITS.

Also, clusters have long been considered a promising source of information. The well-known cluster hypothesis [35] encapsulates the intuition that clusters can reveal groups of relevant documents. In practice, the potential utility of clustering for this purpose has been demonstrated both for the case wherein clusters were created in a query-independent fashion [14] and for the re-ranking setting [13] [22] [34].

In this paper, we show through an array of experiments that consideration of the mutual reinforcement of clusters and documents in determining centrality can lead to highly effective algorithms for re-ranking an initially retrieved list. Specifically, our experimental results show that the centrality-induction methods that we previously studied solely in the context of document-only graphs [18] result in much better re-ranking performance if implemented over bipartite graphs of documents (on one side) and clusters (on the other side). For example, ranking documents by their "authoritativeness" as computed by HITS upon these cluster-document graphs yields better performance than that of a previously proposed PageRank implementation applied to document-only graphs. Interestingly, we also find that cluster authority scores can be used to identify clusters containing a large percentage of relevant documents.

2. ALGORITHMS FOR RE-RANKING 

Since we are focused on the structural re-ranking paradigm, our algorithms are applied not to the entire corpus but to a subset D_init, defined as the top N documents retrieved in response to the query q by a given initial retrieval engine. Some of our algorithms also take into account a set Cl(D_init) of clusters of the documents in D_init. We use S_init to refer generically to whichever set of entities (either D_init or D_init ∪ Cl(D_init)) is used by a given algorithm.

The basic idea behind the algorithms we consider is to determine centrality within a relevance-flow graph. We define this as a directed graph with non-negative weights on the edges in which

• the nodes are the elements of S_init, and

• the weight on an edge from node u to node v is based on the strength of evidence for v's relevance that would follow from an assertion that u is relevant.

By construction, then, any measure of the centrality of s ∈ S_init should measure the accumulation of evidence for its relevance according to the set of interconnections among the entities in S_init. Such information can then optionally be subjected to additional processing, such as integration with information on each item's similarity to the query, to produce a final re-ranking of D_init.

Conventions regarding graphs. The types of relevance-flow graphs we consider can all be represented as weighted directed graphs of the form (V, wt). Here V is a finite non-empty set of nodes and wt : V × V → [0, ∞) is a non-negative edge-weight function. Our graphs thus technically have edges between all ordered pairs of nodes (self-loops included); however, edges with zero edge-weight are conceptually equivalent to missing edges. For clarity, we write wt(u → v) instead of wt(u, v).

2.1 Hubs, authorities, and the HITS algorithm

The HITS algorithm for computing centrality can be motivated as follows. Let G = (V, wt) be the input graph, and let v be a node in V. First, suppose we somehow knew the hub score hub(u) of each node u ∈ V, where "hubness" is the extent to which the nodes that u points to are "good" in some sense. Then, v's authority score

$$ \mathrm{auth}(v) = \sum_{u \in V} wt(u \to v) \cdot \mathrm{hub}(u) \tag{1} $$

would be a natural measure of how "good" v is. A node that is "strongly" pointed to by high-quality hubs (which, by definition, tend to point to "good" nodes) receives a high score. But where do we get the hub score for a given node u? A natural choice is to use the extent to which u "strongly" points to highly authoritative nodes:

$$ \mathrm{hub}(u) = \sum_{v \in V} wt(u \to v) \cdot \mathrm{auth}(v) \tag{2} $$

Clearly, Equations (1) and (2) are mutually recursive. However, the iterative HITS algorithm^1 provably converges to (non-identically-zero, non-negative) score functions hub* and auth* that satisfy the above pair of equations.

Figure 1 depicts the "iconic" case in which the input graph G is one-way bipartite. That is, V can be partitioned into non-empty sets V_Left and V_Right such that only edges in V_Left × V_Right can receive positive weight, and ∀u ∈ V_Left, Σ_{v ∈ V} wt(u → v) > 0. It is the case that auth*(u) = 0 for every u ∈ V_Left and hub*(v) = 0 for every v ∈ V_Right. In this sense, the left-hand nodes are "pure" hubs and the right-hand nodes are "pure" authorities.



Figure 1: A one-way bipartite graph. We only show positive-weight edges (omitting weight values). According to HITS, the left-hand nodes are (pure) hubs; the right-hand ones are (pure) authorities.

Note that in the end, we need to produce a single centrality score for each node n ∈ V. For experimental simplicity, we consider only two possibilities in this paper: using auth*(n) as the final centrality score, or using hub*(n) instead. Combining the hub and authority scores is also an interesting possibility.
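The following minimal sketch shows how Equations (1) and (2) can be solved by power iteration in Python. It assumes the graph is stored as a dict from (u, v) pairs to weights, where absent pairs have weight 0. The function name `hits`, the L2 rescaling, and the iteration limits are illustrative assumptions; this is not a description of the exact implementation behind our experiments. On a toy one-way bipartite graph, the left-hand nodes end up with zero authority and the right-hand nodes with zero hub score, as stated above.

```python
import math


def hits(nodes, wt, iters=200, tol=1e-12):
    """Power-iteration sketch of HITS on an edge-weighted directed graph.

    nodes: list of node ids.
    wt: dict mapping (u, v) -> non-negative weight; missing pairs mean weight 0.
    Returns (hub, auth), two dicts of L2-normalized scores.
    """
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Equation (1): auth(v) = sum over u of wt(u -> v) * hub(u)
        new_auth = {n: 0.0 for n in nodes}
        for (u, v), w in wt.items():
            new_auth[v] += w * hub[u]
        # Equation (2): hub(u) = sum over v of wt(u -> v) * auth(v)
        new_hub = {n: 0.0 for n in nodes}
        for (u, v), w in wt.items():
            new_hub[u] += w * new_auth[v]
        # Rescale each vector; only relative scores matter for ranking.
        a_norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        new_auth = {n: x / a_norm for n, x in new_auth.items()}
        new_hub = {n: x / h_norm for n, x in new_hub.items()}
        delta = max(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n])
                    for n in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Toy one-way bipartite graph: clusters c1, c2 point to documents d1..d3.
nodes = ["c1", "c2", "d1", "d2", "d3"]
wt = {("c1", "d1"): 0.9, ("c1", "d2"): 0.5, ("c2", "d2"): 0.8, ("c2", "d3"): 0.1}
hub, auth = hits(nodes, wt)
print(sorted(["d1", "d2", "d3"], key=lambda d: -auth[d]))  # ['d2', 'd1', 'd3']
```

Ranking depends only on relative scores, so any positive per-iteration rescaling yields the same ordering.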

2.2 Graph schemata: incorporating clusters 

Recall that the fundamental operation in our structural re-ranking paradigm is to compute the centrality of entities within a set S_init. One possibility is to define S_init as D_init, the documents in the initially retrieved set. We refer generically to any relevance-flow graph induced under this choice as a document-to-document graph. Note, however, that for non-Web documents it may not be obvious a priori what kinds of documents are hubs and what kinds are authorities.

Alternatively, we can define S_init as D_init ∪ Cl(D_init), where Cl(D_init) consists of clusters of the documents in D_init. On a purely formal level, doing so allows us to map the hubs/authorities duality discussed above onto the documents/clusters duality, as follows. Recall the "iconic" case of one-way bipartite graphs G = ((V_Left, V_Right), wt). We can create document-as-authority graphs simply by choosing V_Left = Cl(D_init) and V_Right = D_init, so that clusters necessarily serve the role of (pure) hubs and documents serve the role of (pure) authorities. Contrariwise, we can create document-as-hub graphs by setting V_Left = D_init and V_Right = Cl(D_init).^2

But the advantages of incorporating cluster-based information are not just formal. The well-known cluster hypothesis [35] encapsulates the intuition that clusters can reveal groups of relevant documents. In practice, the potential utility of clustering for this purpose has been demonstrated a number of times, whether the clusters were created in a query-independent fashion [14] [4] or from the initially most-highly-ranked documents for some query [13] [22] [34] (i.e., in the re-ranking setting). Since central clusters are, supposedly, those that accrue the most evidence for relevance, documents that are strongly identified with such clusters should themselves be judged highly relevant.^3,4 But identifying such clusters is facilitated by knowledge of which documents are most likely to be relevant. This is exactly the mutual reinforcement property that HITS was designed to leverage.

2.3 Alternative scores: PageRank and influx

We will compare the results of using the HITS algorithm against those derived using PageRank instead. This is a natural comparison for two reasons. PageRank is the most well-known centrality-induction algorithm utilized for ranking documents. In addition, in earlier work [18], PageRank performed quite well as a tool for structural re-ranking of non-Web documents, at least when applied to document-to-document graphs.

One can think of PageRank as a version of HITS in which the hub/authority distinction has been collapsed. Thus, writing "PR" for both auth and hub, we conceptually have the (single) equation

$$ \mathrm{PR}(v) = \sum_{u \in V} wt(u \to v) \cdot \mathrm{PR}(u) \tag{3} $$

However, in practice, we incorporate Brin and Page's smoothing scheme [3] together with a correction for nodes with no positive-weight edges emanating from them [27] [21]:

$$ \mathrm{PR}(v) \stackrel{def}{=} \sum_{u \in V:\, \mathrm{out}(u) > 0} \left[ \frac{1-\lambda}{|V|} + \lambda \, \frac{wt(u \to v)}{\mathrm{out}(u)} \right] \mathrm{PR}(u) \; + \sum_{u \in V:\, \mathrm{out}(u) = 0} \frac{1}{|V|} \, \mathrm{PR}(u) \tag{4} $$

where out(u) = Σ_{v' ∈ V} wt(u → v'), and λ ∈ (0, 1) is the damping factor.^5

^1 Strictly speaking, the algorithm and proof of convergence as originally presented [16] need (trivial) modification to apply to edge-weighted graphs.
^2 In practice, one can simultaneously compute the output of HITS for a given document-as-authority and document-as-hub graph pair by "overlaying" the two into a single graph and suitably modifying HITS's normalization scheme.
^3 We say "are strongly identified with", as opposed to "belong to", to allow for overlapping or probabilistic clusters. Indeed, the one-way bipartite graphs we construct are ill-suited to the HITS algorithm if document-to-cluster links are based on membership in disjoint clusters.
^4 This is, in some sense, a type of smoothing. A document might be missing some of the query terms (perhaps due to synonymy), but if it lies within a sector of "document space" containing many relevant documents, it could still be deemed highly relevant. Recent research pursues this smoothing idea at a deeper level [25] [17].
^5 Under the original "random surfer" model, the sum of the transition probabilities out of "no outflow" nodes (which are abundant in one-way bipartite graphs) would be (1 − λ), not 1. Conceptually, the role of the second summation in Equation (4) is to set λ = 0 for these no-outflow nodes.
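Equation (4) can likewise be solved by power iteration. The sketch below is one possible Python rendering. It assumes edge weights are stored as a dict from (u, v) pairs to non-negative weights. The function name `pagerank`, the default λ = 0.85, and the tolerance are illustrative assumptions; our experiments choose λ from the grid given in Section 4.2. In each sweep, every node receives a uniform "jump" share: (1 − λ)/|V| of the score of each node with outflow, plus 1/|V| of the score of each no-outflow node. Each node also receives λ times the normalized flow along its incoming edges. As a result, the total score stays at 1.

```python
def pagerank(nodes, wt, lam=0.85, iters=1000, tol=1e-12):
    """Power-iteration sketch of Equation (4).

    nodes: list of node ids; wt: dict (u, v) -> non-negative weight.
    lam: damping factor in (0, 1). Nodes with no positive outflow jump
    uniformly, i.e., lam is effectively 0 for them.
    """
    n = len(nodes)
    out = {u: 0.0 for u in nodes}
    for (u, _), w in wt.items():
        out[u] += w
    pr = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iters):
        # Uniform share every node receives: (1 - lam)/|V| of each node with
        # outflow, plus 1/|V| of each no-outflow node.
        jump = sum(((1.0 - lam) if out[u] > 0 else 1.0) * pr[u]
                   for u in nodes) / n
        new = {v: jump for v in nodes}
        # Link-following share: lam * wt(u -> v) / out(u) * PR(u).
        for (u, v), w in wt.items():
            if w > 0:  # w > 0 implies out[u] > 0
                new[v] += lam * (w / out[u]) * pr[u]
        delta = max(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


nodes = ["c1", "c2", "d1", "d2", "d3"]
wt = {("c1", "d1"): 0.9, ("c1", "d2"): 0.5, ("c2", "d2"): 0.8, ("c2", "d3"): 0.1}
pr = pagerank(nodes, wt, lam=0.6)
print(round(sum(pr.values()), 6))  # 1.0: the solution stays sum-normalized
```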



Equation (4) is recursive, but there are iterative algorithms that provably converge to the unique positive solution PR* satisfying the sum-normalization constraint Σ_{v ∈ V} PR(v) = 1 [21]. Moreover, a (non-trivial) closed-form solution, which is also quite easily computed, exists for one-way bipartite graphs:



Theorem 1. If G = (V, wt) is one-way bipartite, then

$$ \mathrm{PR}_{\mathrm{bip}}(v) \stackrel{def}{=} \sum_{u \in V:\, \mathrm{out}(u) > 0} \frac{wt(u \to v)}{\mathrm{out}(u)} \tag{5} $$

is an affine transformation (with respect to positive constants) of, and therefore equivalent for ranking purposes to, the unique positive sum-normalized solution to Equation (4).

(Proof omitted due to space constraints.) Interestingly, this result shows that clusters and documents do not "compete" for PageRank score in our document-as-authority and document-as-hub graphs, even though one might have expected them to when they are placed within the same graph.

Earlier work [18] also considered scoring a node v by its influx, Σ_{u ∈ V} wt(u → v). This can be viewed either as a non-recursive version of Equation (3) or as an un-normalized analog of Equation (5).
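Both closed-form scores require only one pass over the edges. The following Python sketch computes Equation (5) and influx. The function names `pr_bip` and `influx` are hypothetical, and edge weights are assumed to be stored as a dict from (u, v) pairs to weights. In the toy one-way bipartite graph, one cluster spreads its weight over two documents, while another cluster points to a single document. This example shows how normalizing by out(u) can change the ranking.

```python
def pr_bip(nodes, wt):
    """Equation (5): sum over u with out(u) > 0 of wt(u -> v) / out(u)."""
    out = {u: 0.0 for u in nodes}
    for (u, _), w in wt.items():
        out[u] += w
    score = {v: 0.0 for v in nodes}
    for (u, v), w in wt.items():
        if w > 0:  # w > 0 implies out[u] > 0
            score[v] += w / out[u]
    return score


def influx(nodes, wt):
    """Influx: the un-normalized sum over u of wt(u -> v)."""
    score = {v: 0.0 for v in nodes}
    for (_, v), w in wt.items():
        score[v] += w
    return score


nodes = ["c1", "c2", "d1", "d2", "d3"]
wt = {("c1", "d1"): 0.9, ("c1", "d2"): 0.9, ("c2", "d3"): 0.3}
docs = ["d1", "d2", "d3"]
p, f = pr_bip(nodes, wt), influx(nodes, wt)
print(sorted(docs, key=lambda d: -p[d]))  # ['d3', 'd1', 'd2']
print(sorted(docs, key=lambda d: -f[d]))  # ['d1', 'd2', 'd3']
```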

2.4 Algorithms based on centrality scores 

Clearly, we can rank documents by their scores as computed by any of the functions introduced above. But when we operate on document-as-authority or document-as-hub graphs, centrality scores for the clusters are also produced. These can be used to derive alternative means for ranking documents. We follow Liu and Croft's approach [25]. First, rank the documents within (or most strongly associated with) each cluster according to the initial retrieval engine's scores. Then, derive the final list by concatenating the within-cluster lists in order of decreasing cluster score, discarding repeats. Such an approach would be successful if cluster centrality is strongly correlated with the property of containing a large percentage of relevant documents.
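The concatenation step can be sketched in Python as follows. We assume that clusters are given as lists of document IDs and that both the cluster scores and the initial retrieval scores are plain dicts. The function name `rank_by_clusters` is hypothetical, and ties are broken by ID.

```python
def rank_by_clusters(cluster_scores, clusters, init_scores):
    """Liu-and-Croft-style concatenation (sketch).

    cluster_scores: dict cluster_id -> centrality score (e.g., auth* or influx).
    clusters: dict cluster_id -> list of associated document ids.
    init_scores: dict doc_id -> initial retrieval score (higher is better).
    """
    ranked, seen = [], set()
    # Visit clusters by decreasing centrality score, ties broken by id.
    for c in sorted(clusters, key=lambda c: (-cluster_scores[c], c)):
        # Within a cluster, order documents by their initial scores.
        for d in sorted(clusters[c], key=lambda d: (-init_scores[d], d)):
            if d not in seen:  # discard repeats (clusters may overlap)
                seen.add(d)
                ranked.append(d)
    return ranked


clusters = {"cA": ["d1", "d3"], "cB": ["d2", "d3", "d4"]}
cluster_scores = {"cA": 0.2, "cB": 0.7}
init_scores = {"d1": -3.0, "d2": -5.0, "d3": -4.0, "d4": -6.0}
print(rank_by_clusters(cluster_scores, clusters, init_scores))
# ['d3', 'd2', 'd4', 'd1']
```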

Ranking algorithms. Since we have two possible ranking paradigms, we adopt the following algorithm-naming conventions. Names consist of a hyphen-separated prefix and suffix. The prefix ("doc" or "clust") indicates how documents were ranked. "doc" means they were ranked directly by their centrality scores. "clust" means they were ranked indirectly through the concatenation process outlined above, which employs the clusters' centrality scores. The suffix ("Auth", "Hub", "PR", or "Influx") indicates which score function (auth*, hub*, PR* (or PR_bip), or influx) was used to measure centrality. For a given re-ranking algorithm, we indicate the graph upon which it was run in brackets, e.g., "doc-Auth[G]".

3. RELATED WORK 

The potential merits of query-dependent clustering, that is, clustering the documents retrieved in response to a query, have long been recognized [30] [36] [?] [34] [25], especially in interactive retrieval settings [13] [22] [32]. However, automatically detecting clusters that contain many relevant documents remains a very hard task [36]. Section 5.2 presents results for detecting such clusters using centrality-based cluster ranking.



Recently, there has been a growing body of work on graph-based modeling for different language-processing tasks wherein links are induced by inter-entity textual similarities. Examples include document (re-)ranking [7] [24] [9] [18] [39], text summarization [11] [26], sentence retrieval [28], and document representation [10]. In contrast to our methods, links in these works connect entities of the same type, and clusters of entities are not modeled within the graphs.

Ideas similar to ours are abundant (e.g., [15] [5] [?]). Such work leverages the mutual reinforcement of entities of different types, or uses bipartite graphs of such entities for clustering (rather than using clusters). We focus here on exploiting mutual reinforcement in ad hoc retrieval.

Random walks (with early stopping) over bipartite graphs of terms and documents were used for query expansion [20], but in contrast to our work, no stationary solution was sought. A similar "short chain" approach utilizing bipartite graphs of clusters and documents for ranking an entire corpus was recently proposed [10]; this is the work that most resembles ours. However, again, a stationary distribution was not sought. Also, query-drift prevention mechanisms were required to obtain good performance; in our re-ranking setting, we need not employ such mechanisms.

4. EVALUATION FRAMEWORK 

Most aspects of the evaluation framework described below are adopted from our previous experiments with non-cluster-based structural re-ranking [18], so as to facilitate direct comparison. Section 4.1 of [18] provides a more detailed justification of the experimental design. The main conceptual changes^6 here are a slightly larger parameter search-space for the "out-degree" parameter δ (called the "ancestry" parameter α in [18]) and, of course, the incorporation of clusters.

4.1 Graph construction 

Relevance flow based on language models (LMs). We need to estimate the degree to which one item, if considered relevant, can vouch for the relevance of another. To do so, we follow our previous work on document-based graphs [18] and utilize p_d^{[μ]}(·), the unigram Dirichlet-smoothed language model induced from a given document d (μ is the smoothing parameter) [38]. To adapt this estimation scheme to settings involving clusters, we derive the language model p_c^{[μ]}(·) for a cluster c by treating c as the (large) document formed by concatenating^7 its constituent (or most strongly associated) documents [17] [25] [19].
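For concreteness, here is a minimal Python sketch of two pieces. The first is the standard Dirichlet-smoothed estimate p_d^{[μ]}(w) = (tf(w, d) + μ·p_C(w)) / (|d| + μ) [38]. The second is a cluster model built by concatenating member documents. The function names are hypothetical, and the collection model p_C is assumed to be supplied as a dict of word probabilities. Setting μ = 0 recovers the unsmoothed maximum-likelihood estimate. The toy example uses μ = 0 so that the effect of concatenation is easy to read off.

```python
from collections import Counter


def dirichlet_lm(tokens, collection_prob, mu=2000.0):
    """Return w -> p^[mu](w) = (tf(w) + mu * p_C(w)) / (len(tokens) + mu).

    collection_prob: dict word -> corpus probability (assumed to be given).
    mu = 0 yields the unsmoothed maximum-likelihood estimate.
    """
    tf = Counter(tokens)
    length = len(tokens)

    def prob(w):
        return (tf[w] + mu * collection_prob.get(w, 0.0)) / (length + mu)

    return prob


def cluster_tokens(member_docs):
    """A cluster is treated as the concatenation of its member documents."""
    return [w for doc in member_docs for w in doc]


docs = {"d1": ["graph", "rank", "rank"], "d2": ["cluster", "rank"]}
corpus = [w for toks in docs.values() for w in toks]
p_C = {w: n / len(corpus) for w, n in Counter(corpus).items()}
p_c = dirichlet_lm(cluster_tokens([docs["d1"], docs["d2"]]), p_C, mu=0.0)
print(p_c("rank"))  # 0.6, i.e., 3 of the 5 concatenated tokens
```

Because a unigram model depends only on term counts, concatenating the members in a different order yields the same cluster model.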

The relevance-flow measure we use is essentially a directed similarity in language-model space:

$$ \mathrm{rflow}(x, y) \stackrel{def}{=} \exp\left( -D\left( p_x^{[0]}(\cdot) \,\middle\|\, p_y^{[\mu]}(\cdot) \right) \right) \tag{6} $$

Here D is the Kullback-Leibler divergence and p_x^{[0]}(·) is the unsmoothed (maximum-likelihood, i.e., μ = 0) language model of x. The asymmetry of this measure corresponds nicely to the intuition that relevance flow is not symmetric [18]. Moreover, this function is somewhat insensitive to large length differences between the items in question [18]. This is advantageous when both documents and clusters (which we treat as very long documents) are considered.

^6 Some of the PageRank results appearing in our previous paper [18] accidentally reflect experiments utilizing a sub-optimal choice of D_init. For citation purposes, the numbers reported in the current paper should be used.
^7 Concatenation order is irrelevant for unigram LMs.
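The following Python sketch computes Equation (6) directly from token lists. It assumes that x's model is unsmoothed and that y's model is Dirichlet-smoothed. It also assumes a collection model `p_C` that gives every word of x a positive probability, which keeps the divergence finite. The function name and the toy texts are illustrative. The small value μ = 1 is used only so that the asymmetry is visible on such short texts; edge weights in our experiments use μ = 2000.

```python
import math
from collections import Counter


def rflow(x_tokens, y_tokens, collection_prob, mu=2000.0):
    """exp(-D(p_x^[0] || p_y^[mu])) for two token lists.

    p_x^[0]: unsmoothed (maximum-likelihood) model of x.
    p_y^[mu]: Dirichlet-smoothed model of y; collection_prob must give every
    word of x a positive probability so that the divergence is finite.
    """
    tf_x, len_x = Counter(x_tokens), len(x_tokens)
    tf_y, len_y = Counter(y_tokens), len(y_tokens)
    kl = 0.0
    for w, n in tf_x.items():  # words absent from x contribute 0 to the sum
        p_x = n / len_x
        p_y = (tf_y[w] + mu * collection_prob[w]) / (len_y + mu)
        kl += p_x * math.log(p_x / p_y)
    return math.exp(-kl)


a = ["rank", "rank", "graph"]
b = ["rank", "cluster", "cluster", "graph"]
counts = Counter(a + b)
p_C = {w: n / sum(counts.values()) for w, n in counts.items()}
print(round(rflow(a, b, p_C, mu=1.0), 3), round(rflow(b, a, p_C, mu=1.0), 3))
# 0.521 0.502
```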

Previous work [18] [33] makes heavy use of the idea of nearest neighbors in language-model space. It is therefore convenient to introduce the notation Nbhd(x | m, R), pronounced "neighborhood". This denotes the m items y within the "restriction set" R that have the highest values of rflow(x, y). We break ties by item ID, assuming that IDs have been assigned to documents and clusters. Note that the neighborhood of x corresponds to what we previously termed the "top generators" of x [18].

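Here is a direct Python rendering of Nbhd(x | m, R). It rests on two assumptions of ours: rflow is available as a callable, and item IDs are strings whose ascending order gives the tie-breaking order.

```python
def nbhd(x, m, restriction, rflow):
    """Nbhd(x | m, R): the m items y in R with the highest rflow(x, y).

    rflow: callable (x, y) -> float. Ties are broken by ascending item ID,
    here assumed to be a string.
    """
    return sorted(restriction, key=lambda y: (-rflow(x, y), y))[:m]


flow = {("d1", "d2"): 0.4, ("d1", "d3"): 0.7, ("d1", "d4"): 0.4}
print(nbhd("d1", 2, ["d4", "d3", "d2"], lambda x, y: flow[(x, y)]))
# ['d3', 'd2']  (d2 and d4 tie; the smaller ID wins)
```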
Graphs used in experiments. For a given set D_init of initially retrieved documents and a positive integer δ (an "out-degree" parameter), we consider the following three graphs. Each graph connects every node u to the δ other nodes, drawn from some specified set, to which u has the highest relevance flow.

The document-to-document graph d↔d has vertex set D_init and weight function

$$ wt^{d \leftrightarrow d}(u \to v) = \begin{cases} \mathrm{rflow}(u, v) & \text{if } v \in \mathrm{Nbhd}(u \mid \delta, D_{init} - \{u\}), \\ 0 & \text{otherwise.} \end{cases} $$

The document-as-authority graph c→d has vertex set D_init ∪ Cl(D_init) and a weight function such that positive-weight edges go only from clusters to documents:

$$ wt^{c \to d}(u \to v) = \begin{cases} \mathrm{rflow}(u, v) & \text{if } u \in Cl(D_{init}) \text{ and } v \in \mathrm{Nbhd}(u \mid \delta, D_{init}), \\ 0 & \text{otherwise.} \end{cases} $$

The document-as-hub graph d→c has vertex set D_init ∪ Cl(D_init) and a weight function such that positive-weight edges go only from documents to clusters:

$$ wt^{d \to c}(u \to v) = \begin{cases} \mathrm{rflow}(u, v) & \text{if } u \in D_{init} \text{ and } v \in \mathrm{Nbhd}(u \mid \delta, Cl(D_{init})), \\ 0 & \text{otherwise.} \end{cases} $$

Since the latter two graphs are one-way bipartite, Theorem 1 applies to them.
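The three edge-weight functions can be materialized as sparse dicts. In this sketch, `build_graphs` and `top_m` are hypothetical helpers and IDs are strings. The toy relevance-flow function is a symmetric stand-in based on made-up one-dimensional positions, whereas the real rflow of Equation (6) is asymmetric. The example is meant only to show which pairs receive positive weight.

```python
import math


def top_m(x, m, restriction, rflow):
    # Nbhd(x | m, R): highest rflow(x, y) first, ties broken by item ID.
    return sorted(restriction, key=lambda y: (-rflow(x, y), y))[:m]


def build_graphs(docs, clusters, rflow, delta):
    """Edge-weight dicts {(u, v): weight} for the d<->d, c->d and d->c graphs.

    docs, clusters: lists of string IDs for D_init and Cl(D_init).
    Pairs missing from a dict have weight 0 (no edge).
    """
    dd, cd, dc = {}, {}, {}
    for u in docs:
        others = [d for d in docs if d != u]  # D_init - {u}
        for v in top_m(u, delta, others, rflow):
            dd[(u, v)] = rflow(u, v)
        for v in top_m(u, delta, clusters, rflow):  # documents -> clusters
            dc[(u, v)] = rflow(u, v)
    for u in clusters:
        for v in top_m(u, delta, docs, rflow):  # clusters -> documents
            cd[(u, v)] = rflow(u, v)
    return dd, cd, dc


# Toy stand-in for rflow: made-up 1-D positions (the real rflow is asymmetric).
pos = {"d1": 0.0, "d2": 1.0, "d3": 3.0, "c1": 0.5, "c2": 2.5}


def toy_rflow(x, y):
    return math.exp(-abs(pos[x] - pos[y]))


dd, cd, dc = build_graphs(["d1", "d2", "d3"], ["c1", "c2"], toy_rflow, delta=1)
print(sorted(cd))  # [('c1', 'd1'), ('c2', 'd3')]
```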

Clustering method. Clearly, our cluster-based graphs require the construction of clusters of the documents in D_init. Since this set is query-dependent, at least some of the clustering process must occur at retrieval time, which mandates the use of extremely efficient algorithms [5] [37]. The approach we adopt is to use overlapping nearest-neighbor clusters, which have formed the basis of effective retrieval algorithms in other work [12] [17] [18] [33]. For each document d ∈ D_init, we form the cluster {d} ∪ Nbhd(d | k − 1, D_init − {d}), where k is the cluster-size parameter.
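The clustering step amounts to one neighborhood query per document. Below is a minimal Python sketch with a hypothetical function name and a toy relevance-flow callable built from made-up positions. Different seed documents may produce clusters with identical members (d1 and d2 below), since clusters are allowed to overlap.

```python
import math


def nn_clusters(docs, k, rflow):
    """One overlapping cluster per document d: {d} plus Nbhd(d | k - 1, D_init - {d}).

    docs: list of string IDs; rflow: callable (x, y) -> float.
    Returns a dict mapping each seed document to its cluster's member list.
    """
    clusters = {}
    for d in docs:
        others = [y for y in docs if y != d]
        # k - 1 nearest neighbors of d, ties broken by item ID.
        nbrs = sorted(others, key=lambda y: (-rflow(d, y), y))[: k - 1]
        clusters[d] = [d] + nbrs
    return clusters


pos = {"d1": 0.0, "d2": 1.0, "d3": 3.0, "d4": 3.5}  # made-up toy positions


def toy_rflow(x, y):
    return math.exp(-abs(pos[x] - pos[y]))


print(nn_clusters(["d1", "d2", "d3", "d4"], k=2, rflow=toy_rflow))
# {'d1': ['d1', 'd2'], 'd2': ['d2', 'd1'], 'd3': ['d3', 'd4'], 'd4': ['d4', 'd3']}
```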

4.2 Experimental Setup 

We conducted our experiments on three TREC datasets:

corpus   # of docs   queries          disk(s)
AP       242,918     51-64, 66-150    1-3
TREC8    528,155     401-450          4-5
WSJ      173,252     151-200          1-2



We applied basic tokenization and Porter stemming via the 
Lemur toolkit (www.lemurproject.org), which we also used 
for language-model induction. Topic titles served as queries. 





                     AP                        TREC8                     WSJ
                     prec@5  prec@10  MRR      prec@5  prec@10  MRR      prec@5  prec@10  MRR
doc-Auth[d↔d]        .509    .486     .638     .440    .424     .648     .504    .464     .638
doc-PageRank[d↔d]    .519    .480     .632     .524    .446     .666     .536    .486     .699
doc-Auth[c→d]        .541    .501p    .669p    .544a   .452     .674     .564a   .514a    .746a

Table 1: Main comparison: HITS or PageRank on document-only graphs versus HITS on cluster-to-document graphs. Bold: best results per column. Symbols "p" and "a": doc-Auth[c→d] result differs significantly from that of doc-PageRank[d↔d] or doc-Auth[d↔d], respectively.



In many retrieval situations of interest, ensuring that the top few documents retrieved (a.k.a. "the first page of results") tend to be relevant is much more important than ensuring that we assign relatively high ranks to the entire set of relevant documents in aggregate [31]. Hence, rather than use mean average precision (MAP) as an evaluation metric, we apply metrics more appropriate to the structural re-ranking task. These are precision at the top 5 and 10 documents (henceforth prec@5 and prec@10, respectively) and the mean reciprocal rank (MRR) of the first relevant document [31]. All performance numbers are averaged over the set of queries for a given corpus.
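For reference, the three measures can be computed per query and then averaged as shown below. This Python sketch assumes that rankings are lists of document IDs and relevance judgments are sets. It divides prec@k by k even when fewer than k documents were returned; the function names are our own.

```python
def prec_at(ranking, relevant, k):
    """prec@k: fraction of the top k documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def reciprocal_rank(ranking, relevant):
    """1 / (rank of the first relevant document); 0 if none is retrieved."""
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranking, relevant_set) pairs."""
    return sum(metric(r, rel) for r, rel in runs) / len(runs)


runs = [(["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}),
        (["d4", "d5", "d6", "d8", "d0"], {"d6"})]
print(round(mean_over_queries(lambda r, rel: prec_at(r, rel, 5), runs), 3))  # 0.3
print(round(mean_over_queries(reciprocal_rank, runs), 3))  # 0.417
```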

The natural baseline for the work described here is the standard language-model-based retrieval approach [29] [5], since it is an effective paradigm that makes no explicit use of inter-document relationships. Specifically, for a given evaluation metric e, the corresponding optimized baseline is the ranking on documents produced by p_d^{[μ(e)]}(q). Here μ(e) is the value of the Dirichlet smoothing parameter that results in the best retrieval performance as measured by e.

A ranking method might assign different items the same score; we break such ties by item ID. Alternatively, the scores used to determine D_init can be utilized, if available.
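The baseline ranking, together with ID-based tie-breaking, can be sketched as follows. The function names are hypothetical. Scores are computed in log space for numerical safety, which does not change the ordering, and every query term is assumed to occur in the collection. Applied to a cluster's concatenated text, the same score gives the cluster query likelihood that Liu and Croft's CQL algorithm uses to rank clusters.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_tokens, collection_prob, mu):
    """log p_d^[mu](q): sum of log Dirichlet-smoothed term probabilities.

    Assumes every query term has positive collection probability.
    """
    tf, length = Counter(doc_tokens), len(doc_tokens)
    return sum(math.log((tf[w] + mu * collection_prob[w]) / (length + mu))
               for w in query)


def rank_by_query_likelihood(query, docs, collection_prob, mu):
    """Document IDs by decreasing query likelihood; ties broken by item ID."""
    scores = {d: query_log_likelihood(query, toks, collection_prob, mu)
              for d, toks in docs.items()}
    return sorted(docs, key=lambda d: (-scores[d], d))


docs = {"d2": ["graph", "rank"], "d1": ["graph", "rank"], "d3": ["cluster", "cluster"]}
p_C = {"graph": 1 / 3, "rank": 1 / 3, "cluster": 1 / 3}
print(rank_by_query_likelihood(["rank"], docs, p_C, mu=2000.0))
# ['d1', 'd2', 'd3']  (d1 and d2 tie; the smaller ID comes first)
```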

Parameter selection for graph-based methods. There are two motivations underlying our approach to choosing values for our algorithms' parameters [18].

First, we hope to show that structural re-ranking can provide better results than the optimized baselines even when initialized with a sub-optimal (yet reasonable) ranking. Hence, let the initial ranking be the document ordering induced on the entire corpus by p_d^{[μ_1000]}(q). Here μ_1000 is the smoothing-parameter value optimizing the average non-interpolated precision of the top 1000 documents. We set D_init to the top 50 documents in the initial ranking.

Second, we wish to show that good results can be achieved without a great deal of parameter tuning. Therefore, we did not tune the smoothing parameter for any of the language models used to determine graph edge-weights. Instead, we simply set μ = 2000 when smoothing was required, following a prior suggestion [38]. Also, the values of the other free parameters were chosen so as to optimize prec@5, regardless of the evaluation metric under consideration.^8 As a consequence, our prec@10 and MRR results are presumably not as high as possible. The advantage of this policy, however, is that we can see whether optimization with respect to a fixed criterion yields good results no matter how "goodness" is measured.



Parameter values were selected from the following sets. The graph "out-degree" δ: {2, 4, 9, 19, 29, 39, 49}. The cluster size k: {2, 5, 10, 20, 30}. The PageRank damping factor λ: {0.05, 0.1, ..., 0.9, 0.95}.
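The selection policy can be sketched as an exhaustive grid search. It maximizes prec@5; among ties it prefers the lowest prec@10 and then the lowest MRR, following the conservative tie-breaking rule given in our footnote on parameter ties. The `evaluate` callable is a hypothetical stand-in for running a re-ranking algorithm over all queries. The toy version below returns made-up numbers purely to exercise the tie-breaking; these are not experimental results.

```python
from itertools import product


def select_parameters(evaluate, grid):
    """Exhaustive search maximizing prec@5, with conservative tie-breaking.

    evaluate: callable(setting) -> (prec5, prec10, mrr), averaged over queries.
    grid: dict parameter name -> list of candidate values.
    Among settings tied on prec@5, the lowest prec@10 wins, then the lowest MRR.
    """
    names = sorted(grid)
    settings = [dict(zip(names, values))
                for values in product(*(grid[n] for n in names))]

    def key(setting):
        prec5, prec10, mrr = evaluate(setting)
        return (-prec5, prec10, mrr)

    return min(settings, key=key)


grid = {"delta": [2, 4, 9, 19, 29, 39, 49], "k": [2, 5, 10, 20, 30]}


def toy_evaluate(s):  # made-up numbers, only to exercise the tie-breaking
    return (0.5 if s["k"] in (5, 10) else 0.4, s["delta"] / 100, 0.6)


print(select_parameters(toy_evaluate, grid))  # {'delta': 2, 'k': 5}
```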

5. EXPERIMENTAL RESULTS 

In what follows, when we say that results or the difference between results are "significant", we mean according to the two-sided Wilcoxon test at a confidence level of 95%.

5.1 Re-Ranking by Document Centrality 

Main result. We first consider our main question: can we substantially boost the effectiveness of HITS by applying it to cluster-to-document graphs? We have argued that such graphs are more suitable for HITS than the document-to-document graphs we constructed in our previous work [18]. The answer, as shown in Table 1, is clearly "yes". Moving to cluster-to-document graphs results in substantial improvement for HITS, and indeed boosts its results over those for PageRank on document-to-document graphs.



^8 If two different parameter settings yield the same prec@5, we choose the setting minimizing prec@10 so as to provide a conservative estimate of expected performance. Similarly, if we have ties for both prec@5 and prec@10, we choose the setting minimizing MRR.



Full suite of comparisons. We now turn to Figure 2, which gives the results for the re-ranking algorithms doc-Influx, doc-PageRank and doc-Auth. Each is applied either to the document-based graph d↔d (as in [18]) or to the cluster-document graph c→d. (Discussion of doc-Hub is deferred to Section 5.3.)

To focus our discussion, it is useful to first point out a general pattern. In almost all of our nine evaluation settings (3 corpora × 3 evaluation measures), all three of the re-ranking algorithms perform better when applied to c→d graphs than to d↔d graphs, as the number of dark bars in Figure 2 indicates. Since it is thus clearly useful to incorporate cluster-based information, we will now mainly concentrate on c→d-based algorithms.

Consider first the results for prec@5, the metric for which the re-ranking algorithms' parameters were optimized. All c→d-based algorithms outperform the prec@5-optimized baseline (significantly so for the AP corpus), even though they are applied to a sub-optimally-ranked initial set. (We hasten to point out that while the initial ranking is always inferior to the corresponding optimized baseline, the differences are never significant.) In contrast, the use of d↔d graphs never leads to significantly superior prec@5 results.

We also observe in Figure 2 that the doc-Auth[c→d] algorithm is always either the best of the c→d-based algorithms or clearly competitive with the best. Furthermore, pairwise comparison of it to each of the doc-Influx[c→d] and doc-PageRank[c→d] algorithms favors the HITS-style doc-Auth[c→d] algorithm in a majority of the evaluation settings.
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in the re-ranking setting has been acknowledged to be a hard 
task for some time [36]. Nevertheless, as stated in Section 
12.41 we experimented with Liu and Croft's general clusters- 
for-selection approach [25] : rank the clusters, then rank the 

Our baseline algo- 
rithm, clust-pc'''(g), adopts Liu and Croft's specific proposal 
of the CQL algorithm — except that we employ overlapping 
rather than hard clusters — wherein clusters are ranked by 
the query likelihood pc (q) instead of one of our centrality 
scores. 

Table [5] (which may appear on the next page) presents the 
performance results. Our first observation is that the clust- 
Influx[d— >c] and clust-Auth[d— »c] algorithms are superior in 
a majority of the relevant comparisons to the initial rank- 
ing, the optimized baselines, and the clust-pc (q) algorithm, 
where the performance differences with the latter sometimes 
achieve significance. 

However, the performance of the document-centrality-based 
algorithm doc-Auth[c— >d] is better in a majority of the eval- 
uation settings than that of any of the cluster-centrality- 
based algorithms. On the other hand, it is possible that the 
latter methods could be improved by a better technique for 
within-cluster ranking. 

To compare the effectiveness of clust-Influx[d— >c] and clust- 

Auth[d— >c] to that of clust-pc (?) in detecting clusters with 
a high percentage of relevant documents — thereby neutral- 
izing within-cluster ranking effects — we present in Table [3] 
the percent of documents in the highest ranked cluster that 
are relevant. (Cluster size (fc) was fixed to either 5 or 10 
and out-degree (5) was chosen to optimize the above per- 
centage.) Indeed, these results clearly show that our best 
cluster-based algorithms are much better than clust-pc (q) 
in detecting clusters containing a high percentage of relevant 
documents, in most cases to a significant degree. 
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Figure 2: All re-ranking algorithms, as applied to 
either d^^d graphs or c^d graphs. 



We also experimented with a few alternative graph-construction
methods, such as sum-normalizing the weights of edges out
of nodes, and found that the doc-Auth[c→d] algorithm
remained superior to doc-Influx[c→d] and doc-PageRank[c→d].
We omit these results due to space constraints.

All in all, these findings lead us to believe that not only is
it useful to incorporate information from clusters, but it can
be more effective to do so in a way reflecting the mutually
reinforcing nature of clusters and documents, as the HITS
algorithm does.
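The mutual reinforcement that HITS exploits can be sketched in a few lines of NumPy. The sketch assumes the one-way bipartite c→d case, with a matrix `W` whose entry W[i, j] holds a (hypothetically precomputed) edge weight from cluster i to document j; clusters then act only as hubs and documents only as authorities, so ranking documents by the returned authority vector corresponds to doc-Auth[c→d]. The function name, L2 normalization, and convergence test are standard choices for HITS power iteration, not necessarily those used in the experiments reported here.

```python
import numpy as np


def bipartite_hits(W: np.ndarray, iters: int = 100, tol: float = 1e-10):
    """HITS on a one-way bipartite graph (left nodes -> right nodes).

    W[i, j] >= 0 is the weight of the edge from left node i (e.g., a cluster)
    to right node j (e.g., a document). Returns (hub scores of left nodes,
    authority scores of right nodes), each L2-normalized.
    """
    if np.any(W < 0) or not np.any(W > 0):
        raise ValueError("W must be non-negative with at least one positive edge")
    hub = np.ones(W.shape[0])
    auth = np.ones(W.shape[1])
    for _ in range(iters):
        # A document is authoritative if good hubs (clusters) point to it.
        new_auth = W.T @ hub
        new_auth /= np.linalg.norm(new_auth)
        # A cluster is a good hub if it points to authoritative documents.
        new_hub = W @ new_auth
        new_hub /= np.linalg.norm(new_hub)
        converged = (np.abs(new_auth - auth).max() < tol
                     and np.abs(new_hub - hub).max() < tol)
        hub, auth = new_hub, new_auth
        if converged:
            break
    return hub, auth
```

For clust-Auth[d→c], one would instead pass the document-to-cluster weight matrix (documents as rows), and the second return value would then hold the cluster authority scores. In a toy graph where two clusters each contain two of three documents and share the middle one, that shared document receives the highest authority score.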

5.2 Re-Ranking by Cluster Centrality 

We now consider the alternative, mentioned in Section 2,
of using the centrality scores for clusters as an indirect means
of ranking documents, in the sense of identifying clusters
that contain a high percentage of relevant documents. Note
that the problem of automatically identifying such clusters



Table 3: Average relevant-document percentage
within the top-ranked cluster. k: cluster size. Bold:
best results per column. c: result differs significantly
from that of clust-p_c(q), used in [25].



5.3 Further Analysis 

Authorities versus hubs. So far, we have only considered
utilizing the authority scores that the HITS algorithm
produces. The chart below shows the effect of ranking
entities by hub scores instead. Specifically, the "documents?"
column compares doc-Auth[c→d] (i.e., ranking documents
by authoritativeness) to doc-Hub[d→c] (i.e., ranking
documents by hubness); similarly, the "clusters?" column
compares clust-Auth[d→c] to clust-Hub[c→d]. Each entry
lists, from left to right in descending order of performance
(except for the one indicated tie), those centrality scoring
functions that lead to an improvement over the initial
ranking: A stands for "authority" and H for "hub".
Cases in which the improvement is significant are marked
Table 2: Cluster-based re-ranking. Bold: best results per column. Symbols i, o, c: results differ significantly
from the initial ranking, the optimized baseline, or (for the re-ranking algorithms) clust-p_c(q) [25], respectively.
We see that in many cases, hub-based re-ranking does
yield better performance than the initial ranking. But
authority-based re-ranking appears to be an even better
choice overall.

HITS on PageRank-style graphs. Consider our comparison
of doc-Auth[d↔d] against doc-PageRank[d↔d]. As the
notation suggests, this corresponds to running HITS and
PageRank on the same graph, d↔d. But an alternative
interpretation [18] is that non-smoothed (or no-random-jump)
PageRank, as expressed by Equation Q, is applied to a
different version of d↔d wherein the original edge weights
wt(u→v) have been smoothed as follows:
(we ignore nodes with no positive-weight out-edges to
simplify discussion, and omit the d↔d superscripts for clarity).
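The smoothing equation itself is not reproduced in this text. Purely as an illustration, the sketch below assumes the usual random-jump form, in which each node's out-edge weights are sum-normalized and then interpolated with a uniform distribution over the n target nodes: wt_λ(u→v) = (1 − λ)/n + λ · wt(u→v) / Σ_{v'} wt(u→v'). Which side of the interpolation λ weights, the function name `smooth_edge_weights`, and the inclusion of self-edges u→u are assumptions made for this sketch. Taking n as the number of columns means that, for a one-way bipartite c→d matrix, n is automatically the number of right-hand (document) nodes.

```python
import numpy as np


def smooth_edge_weights(W: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical random-jump smoothing of a weighted adjacency matrix.

    Assumed form: S[u, v] = (1 - lam) / n + lam * W[u, v] / sum_v' W[u, v'],
    where n = W.shape[1] is the number of target nodes. Every row of W must
    have a positive-weight out-edge (nodes without one are ignored in the text).
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) so that every edge stays positive")
    out_sums = W.sum(axis=1, keepdims=True)  # total out-weight of each node
    if np.any(out_sums <= 0):
        raise ValueError("every node needs a positive-weight out-edge")
    n_targets = W.shape[1]
    # Uniform "random jump" mass plus the sum-normalized original weights.
    return (1.0 - lam) / n_targets + lam * (W / out_sums)
```

Each row of the result sums to 1 and every entry is positive, which is what turns a possibly disconnected graph into one where each node links to every other node.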

How does HITS perform on document-to-document graphs
that are "truly equivalent" to those that PageRank is
applied to, in the sense of employing the above
edge-weighting regime? One reason this is an interesting
question is that HITS assigns scores of zero to nodes that
are not in the graph's largest connected component (with
respect to positive-weight edges, considered to be
bi-directional). Notice that the original graph may have
several connected components, whereas utilizing the
smoothed edge weights ensures that each node has a
positive-weight directed edge to every other node.
Additionally, the re-weighted version of HITS has provable
stability properties [27].
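The zero-score effect can be seen on a toy graph. The hypothetical example below runs plain HITS power iteration on a symmetric, unit-weight graph with two connected components, a 3-node clique and a 2-node pair. In this example the larger component also carries the dominant eigenvalue of WᵀW (4 versus 1), so the pair's authority scores shrink by roughly a factor of 4 per iteration toward zero. Re-running HITS on a row-normalized, uniformly smoothed version of the same graph (the random-jump smoothing form and λ = 0.9 are assumptions made for this sketch) gives every node a positive score.

```python
import numpy as np


def hits_authorities(W: np.ndarray, iters: int = 50) -> np.ndarray:
    """Plain HITS power iteration; returns L2-normalized authority scores."""
    auth = np.ones(W.shape[1])
    for _ in range(iters):
        hub = W @ auth  # hubs point to good authorities
        hub /= np.linalg.norm(hub)
        auth = W.T @ hub  # authorities are pointed to by good hubs
        auth /= np.linalg.norm(auth)
    return auth


# Toy graph: a 3-node clique (nodes 0-2) and a disconnected 2-node pair (3-4).
W = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    W[u, v] = W[v, u] = 1.0

plain = hits_authorities(W)
print(np.round(plain, 4))  # the pair's scores are numerically zero

# Assumed random-jump smoothing with lam = 0.9 over n = 5 target nodes.
lam, n = 0.9, W.shape[1]
S = (1 - lam) / n + lam * W / W.sum(axis=1, keepdims=True)
smoothed = hits_authorities(S)
print(np.round(smoothed, 4))  # every node now gets a positive score
```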

We found that in nearly all of our evaluation settings for
document-to-document graphs (three corpora × three
evaluation metrics), doc-Auth[d↔d] achieved better results
using the smoothed edge weights. However, we cannot
discount the possibility that the performance differences
might be due simply to the inclusion of the extra
interpolation parameter λ. Moreover, in all but one case,
the improved results were still below those for
doc-PageRank[d↔d] (and always lagged behind those of
doc-Auth[c→d]).

Interestingly, the situation is qualitatively different if we
consider c→d graphs instead. In brief, we applied a
smoothing scheme analogous to that described above, but
only to edges leading from a left-hand node (cluster) to a
right-hand node (document)*; we thus preserved the one-way
bipartite structure. In only two of the nine evaluation
settings did this change increase the performance of
doc-Auth[c→d] over the results attained under the original
edge-weighting scheme, despite the fact that the
re-weighting involves an extra free parameter. Thus, while
we have already demonstrated in previous sections of this
paper that information about document-cluster similarity
relationships is very valuable, the results just mentioned
suggest that such information is more useful in "raw" form.

Re-anchoring to the query. In previous work, we showed
that PageRank centrality scores induced over
document-based graphs can be used as a multiplicative
weight on document query-likelihood terms, the intent being
to cope with cases in which centrality in D_init and relevance
are not strongly correlated [18]. Indeed, employing this
technique on the AP, TREC8, and WSJ corpora, prec@5
increases from .519, .524, and .536 to .531, .56, and .572,
respectively.
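The re-anchoring step amounts to ranking each document d by the product of its centrality score and its query likelihood p_d(q). A minimal sketch follows. It works in log space because query likelihoods are typically very small numbers, and it therefore assumes strictly positive centrality scores (authority scores of exactly zero, for instance, would need special handling). The function name `reanchor_to_query` and the dictionary inputs are hypothetical.

```python
import math
from typing import Dict, List


def reanchor_to_query(
    centrality: Dict[str, float],
    log_query_likelihood: Dict[str, float],
) -> List[str]:
    """Rank documents by centrality(d) * p_d(q), computed in log space.

    centrality: strictly positive centrality score per document
        (e.g., PageRank or HITS authority).
    log_query_likelihood: log p_d(q) per document.
    """
    # log(a * b) = log(a) + log(b), so the ordering matches the product.
    score = {d: math.log(c) + log_query_likelihood[d] for d, c in centrality.items()}
    return sorted(score, key=score.get, reverse=True)
```

Because the logarithm is monotone, this produces the same ordering as multiplying the raw scores, while avoiding underflow when many query terms are involved.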

The same modification could be applied to the c→d-based
algorithms, although it is not particularly well-motivated in
the HITS case. While PageRank scores correspond to a
stationary distribution that could be loosely interpreted as
a prior [18], in which case multiplicative combination with
query likelihood is sensible, it is not usual to assign a
probabilistic interpretation to hub or authority scores.

Nonetheless, for the sake of comparison completeness, we
applied this idea to the doc-Auth[c→d] algorithm, yielding
the following performance changes: from .541, .544, and
.564 to .537, .572, and .572, respectively. These results are
at least as good as, and for two corpora better than, those
for PageRank as a multiplicative weight on query likelihood.
Thus, it may be the case that centrality scores induced over a
document-based graph are more effective as a multiplicative
bias on query likelihood than as direct representations of
relevance in D_init (see also [18]); but, modulo the caveat
above, it seems that when centrality is induced over
cluster-based one-way bipartite graphs, the correlation with
relevance is much stronger, and hence this kind of centrality
serves as a better "bias" on query likelihood.

*In the one-way bipartite case, the |V| in Equation (7)
must be changed to the number of right-hand nodes.

6. CONCLUSION 

We have shown that leveraging the mutually reinforcing
relationship between clusters and documents to determine
centrality is very beneficial not only for directly finding
relevant documents in an initially retrieved list, but also for
finding clusters of documents from this list that contain a
high percentage of relevant documents.

Specifically, we demonstrated the superiority of cluster- 
document bipartite graphs to document-only graphs as the 
input to centrality-induction algorithms. Our method for 
finding "authoritative" documents (or clusters) using HITS 
over these bipartite graphs results in state-of-the-art perfor- 
mance for document (and cluster) re-ranking. 